Add -fno-reciprocal-math -mrecip=none to Ofast to solve the AMD/Intel differences#8280
Conversation
|
A new Pull Request was created by @silviodonato (Silvio Donato) for branch IB/CMSSW_13_0_X/master. @cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks. |
|
I suggest to run test using "enable profiling" (I think I don't have anymore the rights to start tests) |
|
test parameters:
|
|
test parameters:
|
|
please test |
|
This PR replaces cms-sw/cmssw#40649 |
|
Is this making the change globally? |
|
Hi everyone, the test was performed using |
|
cc @goys |
|
@Martin-Grunewald , yes, afaik this change should affect all modules, but I let people from @cms-sw/core-l2 to comment more on this. These are the PRs which moved all packages from |
|
@silviodonato @Martin-Grunewald , change will only apply to those packages which directly have |
|
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30324/summary.html Comparison SummarySummary:
|
|
I'm surprised to see so many differences even at simulation level in (eg. 13234.0 and 13434.0, see https://tinyurl.com/27gnfjt9 ), do you think they are just statistical fluctuation of the PU mixing? |
|
please test lets rerun the tests based on latest IB. There were a lot of other changes which were accumulated for last tests |
|
The timing seems ok, but I let @cms-sw/reconstruction-l2 comment more on this |
What is the impact on the HLT timing, running on 2022 data ? |
|
@cms-sw/reconstruction-l2 |
|
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2deb6c/30351/summary.html The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: Comparison SummarySummary:
|
As mentioned in yesterday's ORP, HLT would like to have this PR merged in time for @cms-sw/reconstruction-l2 , do you have any further comments ? (reco already signed in #8280 (comment), but that was before the latest round of tests.) |
|
+reconstruction |
|
This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_13_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
|
@perrotta, @rappoccio this is ready to go in , feel free to merge it for tonight's IB |
|
OK, I am merging it for tonights IB |
|
+1
|
|
@cms-sw/core-l2 @perrotta @rappoccio I think this PR caused the failure of the build in ARM ( I hope this can be solved with something similar to 70f1969 |
|
#8302 should fix the aarch64 issue. |
|
For reference, @gparida run in 13_0_0_pre4 the comparison Intel vs AMD and he found no difference as expected: |
-Ofast implies -funsafe-math-optimizations, which affects the global FP state for all programs: when used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. -Ofast enables (directly or indirectly): - ❌ -fallow-store-data-races - ✔ -fno-semantic-interposition - ✔ -fno-math-errno - ❌ -funsafe-math-optimizations - ❌ -ffinite-math-only - ✔ -fno-rounding-math (_default_) - ✔ -fno-signaling-nans (_default_) - ❔ -fcx-limited-range - ✔ -fexcess-precision=fast - ✔ -fno-signed-zeros - ✔ -fno-trapping-math - ✔ -fassociative-math - ❌ -freciprocal-math (cms-sw#8280) We should not use -fallow-store-data-races, -ffinite-math-only. We should not use -funsafe-math-optimizations at link time, as it affects the global FP state. It's probably easier to replace it with the individual options -fno-signed-zeros, -fno-trapping-math and -fassociative-math. We may revisit -fcx-limited-range if we do use Complex arithmetics. -freciprocal-math was already disabled explicitly, see cms-sw#8280.
-Ofast implies -funsafe-math-optimizations, which affects the global FP state for all programs: when used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. -Ofast enables (directly or indirectly): - ❌ -fallow-store-data-races - ✔ -fno-semantic-interposition - ✔ -fno-math-errno - ❌ -funsafe-math-optimizations - ❌ -ffinite-math-only - ✔ -fno-rounding-math (_default_) - ✔ -fno-signaling-nans (_default_) - ❔ -fcx-limited-range - ✔ -fexcess-precision=fast - ✔ -fno-signed-zeros - ✔ -fno-trapping-math - ✔ -fassociative-math - ❌ -freciprocal-math (disabled in cms-sw#8280) We should not use -fallow-store-data-races, -ffinite-math-only. We should not use -funsafe-math-optimizations at link time, as it affects the global FP state. It's probably easier to replace it with the individual options -fno-signed-zeros, -fno-trapping-math and -fassociative-math. We may revisit -fcx-limited-range if we do use Complex arithmetics. -freciprocal-math was already disabled explicitly, see cms-sw#8280.
About one year ago we discovered differences of few percents between AMD and Intel in few HLT paths when rerunning the HLT [1]. The situation got worse when we added the parking di-electron paths with dZ path, which gave a difference of ~8% [2].
After several failed attempts, @gparida managed to solve the differences following @VinInn suggestion [3] to use
-Ofast -fno-reciprocal-math -mno-recip.@gparida replaced
<use name="ofast-flag"/>with<flags CXXFLAGS="-Ofast -fno-reciprocal-math -mrecip=none"/>in allBuildFiles.xml(excepttestfolders to save compilation time)and found no AMD/Intel differences @gparida please share your results here (which release did you use?).
So I made this PR to implement this change directly at
cmsdistlevel(but I didn't test it compiling the externals).
I don't know exactly the impact on timing, but I assume it will be much smaller than 1%, that was the impact of the full removal of Ofast [4]
[1] https://its.cern.ch/jira/browse/CMSHLT-2257
[2] https://docs.google.com/spreadsheets/d/1XRu1sfIYJUQ0e7Z_yJWdGrxTDA31zr33uWPFY54-APc
[3] https://its.cern.ch/jira/browse/CMSHLT-2257?focusedCommentId=4686018&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-4686018
[4] cms-sw/cmssw#40089
cc @cms-sw/hlt-l2 @cms-sw/reconstruction-l2 @gparida @sanuvarghese @cms-sw/core-l2